Systems, methods and computer products for profile based identity verification over the internet

ABSTRACT

Systems, methods and computer products for profile-based identity verification over the Internet. Exemplary embodiments include a system including an activity classifier configured to receive Internet activity input including email, chat, browser and voice over Internet Protocol (VoIP) logs/streams, an email profiler, a chat profiler, a browser profiler, a voice over Internet Protocol (VoIP) logs/streams profiler, wherein the profilers are configured to extract values from the Internet activity input for activity-specific and generic attributes from the data set, a score calculator configured to receive the attributes and calculate the score of the data set, a categorization engine configured to receive the score from the score calculator and map the data set to an individual or class of individuals based on the value of the score and on a database of activity-specific attributes, and an application configured to place weights on the activity-specific and generic attributes to define a score function from the score.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 11/969,569, filed Jan. 4, 2008, the disclosure of which is incorporated by reference herein in its entirety.

TRADEMARKS

IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to classifying network activity and particularly to systems, methods and computer products for profile-based identity verification over the Internet.

2. Description of Background

Individuals all over the world interact with the Internet through different types of activities (e.g., applications, protocols, services). The behavioral dynamics of an individual in a particular Internet activity environment may be significantly different from those of other individuals. Each Internet activity can be characterized by a set of attributes that can be used to define features of the behavior of an individual while interacting with the Internet through that activity. For example, attributes associated with Email can be: the community of the persons to which emails are normally sent; the time stamp of the emails; the length of emails; the type of attachments (doc/ppt/mpeg . . . ); the subject of emails; the topic generally discussed; and the keywords normally used by a person (e.g., each person has his own set of vocabulary from which they normally choose words to write in emails). Furthermore, those attributes associated with Chat can be: the type of chat community a person joins; the language used in chat environments; the occurrence rate of chat messages; the amount of time a user pauses between sending messages; the length of chat messages in terms of number of words; the type of community according to the time of the day; the reaction time to messages from others; the amount of time a person spends in a particular chat community; and the number of concurrent chat sessions in which an individual participates.

Every individual has a certain personality that is a complex manifestation of the social, political, economic and educational background in which he was brought up and in which he currently resides. The word “personality” here is a broad term including an individual's intelligence level, creativity, vocabulary, interests, linguistic skills, psychological traits, experience with using computer applications, and mannerisms. This personality is reflected in his day-to-day interactions with others, in his thinking, and hence in his actions in different environments and in different situations. An individual's personality also has a crucial effect on his behavior over the Internet. In particular, this personality can be reflected in the values of the different Internet-activity-specific attributes for the individual.

SUMMARY OF THE INVENTION

Exemplary embodiments include a system for profiling a user on a network based on a data set to generate a score, the system including an activity classifier configured to receive Internet activity input including email, chat, browser and voice over Internet Protocol (VoIP) logs/streams, an email profiler coupled to the activity classifier, a chat profiler coupled to the activity classifier, a browser profiler coupled to the activity classifier, a voice over Internet Protocol (VoIP) logs/streams profiler coupled to the activity classifier, wherein the profilers are configured to extract values from the Internet Activity input for activity specific and generic attributes from the data set, a score calculator configured to receive the activity specific and generic attributes and calculate the score of the data set, a categorization engine configured to receive the score from the score calculator and map the data set to an individual or class of individuals based on the value of the score and on a database of activity-specific attributes and an application configured to place weights on the activity specific and generic attributes to define a score function from the score, wherein the categorization engine is further configured to generate a dynamic profile of the data-set based on a score function generated by the application and to generate a dynamic category from the database based on the score function.

System and computer program products corresponding to the above-summarized methods are also described and claimed herein.

Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.

TECHNICAL EFFECTS

As a result of the summarized invention, technically we have achieved a solution which provides profile-based identity verification over the Internet.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 illustrates an exemplary embodiment of a system for profile-based identity verification over the Internet;

FIG. 2 illustrates a high level block diagram of a system in accordance with exemplary embodiments;

FIG. 3 illustrates a block diagram of a hierarchy of Netmetrics™ in accordance with exemplary embodiments; and

FIG. 4 illustrates a flow chart of a method of profiling a user on a network in accordance with exemplary embodiments.

The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.

DETAILED DESCRIPTION OF THE INVENTION

Exemplary embodiments include systems and methods that define, measure and analyze sets of attributes of an individual in an Internet activity environment, which can be implemented for verifying identity. In exemplary embodiments, the systems and methods further classify individuals based on these attributes ascribed to different Internet activities. In exemplary embodiments, similar to biometrics which implement physical or behavioral characteristics (including fingerprints, retina, DNA, voice patterns etc.), the attributes defined herein are based on behavioral patterns on the Internet (i.e., “Netmetrics™”).

In exemplary embodiments, the systems and methods described herein profile an individual based on his behavior over the Internet using different activity-specific metrics and further identify an individual based on feeds from his Internet activities using different activity-specific metrics. In exemplary embodiments, profiling involves defining a vector of attributes corresponding to different Internet activities and then estimating the values of these attributes for an individual. The profiling can be dynamic: as the values of attributes change over time, the individual profiles are also updated. Once a repository of individual profiles is established, a mapping of packets/groups of packets to an individual/group of individuals based on the values of attributes carried by these packets can be performed. The attribute values can be determined by statistical processing of packets, for example, which can involve machine learning techniques such as supervised learning (neural networks, Linear Discriminant Analysis) or unsupervised learning techniques.
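The dynamic profiling described above can be illustrated with a minimal sketch, assuming each attribute is a single number estimated by a running mean. The `Profile` class and the attribute names are hypothetical and are not taken from the specification, which leaves the statistical processing open.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Running estimate of attribute values for one individual (hypothetical structure)."""

    means: dict[str, float] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def update(self, observation: dict[str, float]) -> None:
        """Fold one observation into the profile using an incremental mean."""
        for name, value in observation.items():
            n = self.counts.get(name, 0) + 1
            mean = self.means.get(name, 0.0)
            self.means[name] = mean + (value - mean) / n
            self.counts[name] = n


# Attribute names are illustrative assumptions.
profile = Profile()
profile.update({"email_length_words": 100.0, "chat_pause_seconds": 4.0})
profile.update({"email_length_words": 140.0})
```

An observation may carry only a subset of attributes; attributes that are absent simply keep their previous estimate.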

In exemplary embodiments, the attributes can be unique to an activity, independent of the activity (depending only on the individual), or specific to a class of activities. For example, attributes specific to Email and Chat activities are defined above. The activity-independent attributes can be linguistic skills, typing speed, etc. An example of an attribute specific to a class of activities is the conversation reaction time of an individual, which may be similar in VoIP and chat environments.

In exemplary embodiments, the systems and methods described herein can be implemented by a company for profiling its employees, which can be used, e.g., to identify inappropriate usage of the company's network resources by non-employees (friends, spouses etc.). The systems and methods described herein can also be implemented by the government for monitoring the Internet for suspicious activities. The systems and methods described herein can also be implemented to prevent identity theft, monitor surreptitious activities, and conduct studies on social behavior over the Internet.

In exemplary embodiments, the systems described herein can include a database storing activity-specific attributes. In exemplary embodiments, the attributes can be learned over time corresponding to an (activity, individual) pair. The database can be updated dynamically with new information received. The systems described herein can also include an activity classifier. In exemplary embodiments, the activity classifier classifies the data received into the type of activity to which it corresponds. The systems described herein can also include a data-set profiler, which studies different activity logs in run-time corresponding to individual(s) and calculates values for different (predefined) activity-specific attributes from the logs. In exemplary embodiments, during run-time, feeds may be from only a subset of activities, and some activity-specific attributes may not be calculated due to the time horizon of feeds, etc. The systems described herein can also include a data-set mapper to map the particular data-set which was analyzed by the profiler to different (e.g., predefined and dynamically updated) categories of individuals.
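As a rough illustration of the activity classifier and the attribute database keyed by (activity, individual) pair, consider the following sketch. The record layout, the protocol-to-activity mapping, and the function names are assumptions made for the example; the specification does not say how activities are recognized.

```python
from collections import defaultdict

# Hypothetical mapping from a record's protocol tag to an activity type.
PROTOCOL_TO_ACTIVITY = {
    "smtp": "email",
    "xmpp": "chat",
    "http": "browser",
    "sip": "voip",
}


def classify_activity(record: dict) -> str:
    """Return the activity type for a record, or "unknown" if unrecognized."""
    return PROTOCOL_TO_ACTIVITY.get(record.get("protocol", ""), "unknown")


# Attribute database keyed by (activity, individual); updated as data arrives.
attribute_db: dict[tuple[str, str], dict[str, float]] = defaultdict(dict)


def store_attributes(activity: str, individual: str, attributes: dict[str, float]) -> None:
    """Merge newly calculated attribute values into the database entry."""
    attribute_db[(activity, individual)].update(attributes)


record = {"protocol": "xmpp", "user": "alice", "text": "hello there"}
activity = classify_activity(record)
store_attributes(
    activity,
    record["user"],
    {"message_length_words": float(len(record["text"].split()))},
)
```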

FIG. 1 illustrates an exemplary embodiment of a system 100 for profile-based identity verification over the Internet. The methods described herein can be implemented in software (e.g., firmware), hardware, or a combination thereof. In exemplary embodiments, the methods described herein are implemented in software, as an executable program, and are executed by a special or general-purpose digital computer, such as a personal computer, workstation, minicomputer, or mainframe computer. The system 100 therefore includes general-purpose computer 101.

In exemplary embodiments, in terms of hardware architecture, as shown in FIG. 1, the computer 101 includes a processor 105, memory 110 coupled to a memory controller 115, and one or more input and/or output (I/O) devices 140, 145 (or peripherals) that are communicatively coupled via a local input/output controller 135. The input/output controller 135 can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The input/output controller 135 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

The processor 105 is a hardware device for executing software, particularly that stored in memory 110. The processor 105 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer 101, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing software instructions.

The memory 110 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 110 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 110 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 105.

The software in memory 110 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 1, the software in the memory 110 includes the profile-based identity verification methods described herein in accordance with exemplary embodiments and a suitable operating system (O/S) 111. The operating system 111 essentially controls the execution of other computer programs, such as the profile-based identity verification systems and methods described herein, and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.

The profile-based identity verification methods described herein may be in the form of a source program, executable program (object code), script, or any other entity comprising a set of instructions to be performed. When a source program, then the program needs to be translated via a compiler, assembler, interpreter, or the like, which may or may not be included within the memory 110, so as to operate properly in connection with the O/S 111. Furthermore, the profile-based identity verification methods can be written in an object oriented programming language, which has classes of data and methods, or a procedure programming language, which has routines, subroutines, and/or functions.

In exemplary embodiments, a conventional keyboard 150 and mouse 155 can be coupled to the input/output controller 135. Other I/O devices 140, 145 may include input and output devices, for example but not limited to a printer, a scanner, a microphone, and the like. Finally, the I/O devices 140, 145 may further include devices that communicate both inputs and outputs, for instance but not limited to, a NIC or modulator/demodulator (for accessing other files, devices, systems, or a network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, and the like. The system 100 can further include a display controller 125 coupled to a display 130. In exemplary embodiments, the system 100 can further include a network interface 160 for coupling to a network 165. The network 165 can be an IP-based network for communication between the computer 101 and any external server, client and the like via a broadband connection. The network 165 transmits and receives data between the computer 101 and external systems. In exemplary embodiments, network 165 can be a managed IP network administered by a service provider. The network 165 may be implemented in a wireless fashion, e.g., using wireless protocols and technologies, such as WiFi, WiMax, etc. The network 165 can also be a packet-switched network such as a local area network, wide area network, metropolitan area network, Internet network, or other similar type of network environment. The network 165 may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), intranet or other suitable network system and includes equipment for receiving and transmitting signals.

If the computer 101 is a PC, workstation, intelligent device or the like, the software in the memory 110 may further include a basic input output system (BIOS) (omitted for simplicity). The BIOS is a set of essential software routines that initialize and test hardware at startup, start the O/S 111, and support the transfer of data among the hardware devices. The BIOS is stored in ROM so that the BIOS can be executed when the computer 101 is activated.

When the computer 101 is in operation, the processor 105 is configured to execute software stored within the memory 110, to communicate data to and from the memory 110, and to generally control operations of the computer 101 pursuant to the software. The profile-based identity verification methods described herein and the O/S 111, in whole or in part, but typically the latter, are read by the processor 105, perhaps buffered within the processor 105, and then executed.

When the systems and methods described herein are implemented in software, as is shown in FIG. 1, the methods can be stored on any computer readable medium, such as storage 120, for use by or in connection with any computer related system or method. In the context of this document, a computer readable medium is an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by or in connection with a computer related system or method. The profile-based identity verification methods described herein can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In exemplary embodiments, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

In exemplary embodiments, where the profile-based identity verification methods are implemented in hardware, the profile-based identity verification methods described herein can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.

In exemplary embodiments, one or more processes in the memory 110 can monitor activity from the keyboard 150 and the mouse 155 or a combination thereof. The processes can further monitor long-running jobs that have been initiated on the computer 101. The processes can further monitor which and how many other machines can control the computer 101 either locally or remotely. In exemplary embodiments, the processes can also inquire or accept a grace period input by a user of the computer 101. The grace period can be a time period after which all traffic to and from the computer ceases if no further activity has been sensed by the processes. In this way, if a user has left the computer 101 for an extended period of time or has left the computer (e.g., after a work day) the computer 101 no longer allows traffic to and from the computer 101. In an alternative implementation, the computer 101 can totally power down after the grace period has expired. In further exemplary embodiments, the processes can accept traffic only from a common network maintenance control system that provides limited services.
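The grace-period behavior can be sketched as follows, assuming activity is reported to a small monitor object with explicit timestamps in seconds. The class and method names are hypothetical; how activity is actually sensed and how traffic is blocked are outside the scope of this sketch.

```python
class GracePeriodMonitor:
    """Tracks the last sensed activity and decides whether traffic is allowed."""

    def __init__(self, grace_seconds: float, now: float) -> None:
        self.grace_seconds = grace_seconds
        self.last_activity = now

    def record_activity(self, now: float) -> None:
        """Called whenever keyboard, mouse, or job activity is sensed."""
        self.last_activity = now

    def traffic_allowed(self, now: float) -> bool:
        """Traffic is allowed until the grace period elapses with no activity."""
        return (now - self.last_activity) <= self.grace_seconds


# A 10-minute grace period; the user's last activity is at t = 120 s.
monitor = GracePeriodMonitor(grace_seconds=600.0, now=0.0)
monitor.record_activity(now=120.0)
```

Passing the current time explicitly keeps the sketch deterministic; a real process would more likely read a monotonic clock.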

FIG. 2 illustrates a high level block diagram of a system 190 in accordance with exemplary embodiments. In exemplary embodiments, the system 190 is utilized when the categorization of a data-set involves studying cross-activity correlations of attributes and calculating a user-specified score function. An input 205 includes streams corresponding to different Internet activities, including email, chat, browser and voice over Internet Protocol (VoIP) logs/streams. In exemplary embodiments, the system 190 can also include a user portal hosted by a portal server 206, through which the users can specify their own specific score function and their own list of attributes to be monitored. The input streams from the input 205 are classified using an activity classifier 200, and for each Internet activity there is a corresponding profiler 201, which acts upon the input data-set and extracts values for the activity specific and generic attributes from the data-set. The attributes are then fed to a score calculator 202, whose function is to calculate the score of the data-set under analysis. A cost function can also be programmed by the user through the portal. Once the score is calculated it is fed to a categorization engine 204, which maps the data-set to a particular individual/class of individuals based on the value of the score and the database of activity specific attributes, 103.

In exemplary embodiments, the score is a utility function and can be defined differently by applications. For example, applications that are more interested in identifying individuals based on the types of web-sites of interest to a user can put more weight on the types of web-pages visited under the browser activity and on the types of web-pages discussed under the email and chat activity, and 0 weight on other attributes of these activities. Another application that is monitoring, e.g., a chat site can put more weight on different attributes of chat activity logs and 0 on other activity logs. The score function used to calculate the score of the particular data-set may also exploit correlation of (common) attributes across activities.
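One simple way to realize an application-defined score is a weighted sum over attribute values, sketched below. The linear form, the `make_score_function` helper, and the attribute names are assumptions for illustration; the specification only requires that applications place weights on attributes.

```python
from typing import Callable


def make_score_function(weights: dict[str, float]) -> Callable[[dict[str, float]], float]:
    """Build a score function as a weighted sum; missing attributes count as 0."""

    def score(attributes: dict[str, float]) -> float:
        return sum(w * attributes.get(name, 0.0) for name, w in weights.items())

    return score


# An application interested in web-site topics weights those attributes
# and puts 0 weight on an unrelated chat attribute (names are hypothetical).
browsing_weights = {
    "browser.sports_pages": 0.5,
    "email.sports_mentions": 0.25,
    "chat.sports_mentions": 0.25,
    "chat.pause_seconds": 0.0,
}
score = make_score_function(browsing_weights)

data_set = {
    "browser.sports_pages": 8.0,
    "email.sports_mentions": 4.0,
    "chat.sports_mentions": 0.0,
    "chat.pause_seconds": 30.0,
}
value = score(data_set)  # 0.5*8 + 0.25*4 + 0.25*0 + 0*30
```

A different application would supply a different weight table to the same helper, which is one way to read "defined differently by applications."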

In exemplary embodiments, the categorization engine 204 generates a dynamic profile of the data-set based on the application-specific score function. The categorization engine 204 also creates dynamic categories from the database based on the score function supplied by the application. The results from the categorization engine 204 can be fed into applications 210 tracking and/or monitoring users. The score function can also be a vector of values corresponding to different individual attributes or can be a vector of functions, each mapping a subset of attributes. Though individual attributes alone may not be sufficient to identify an individual as the attribute set of many individuals may overlap, the combined set of attributes across different Internet activities has a high probability of drilling-down to an individual. An individual can be viewed as a point in a multi-dimensional space of attributes associated with Internet activities. As richer sets of attributes for an activity and estimated values for an individual are defined, the ability to identify the individual uniquely also increases.
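Viewing an individual as a point in attribute space suggests a nearest-neighbor mapping, sketched here under the assumption that Euclidean distance over raw attribute values is an acceptable similarity measure. The function name, profiles, and attribute names are hypothetical, and the specification does not prescribe a distance metric.

```python
import math


def nearest_individual(
    data_set: dict[str, float],
    profiles: dict[str, dict[str, float]],
    attribute_names: list[str],
) -> str:
    """Return the stored individual whose attribute vector is closest to the data set."""
    point = [data_set.get(a, 0.0) for a in attribute_names]

    def distance(name: str) -> float:
        other = [profiles[name].get(a, 0.0) for a in attribute_names]
        return math.dist(point, other)

    return min(profiles, key=distance)


profiles = {
    "alice": {"email_length": 120.0, "chat_pause": 4.0},
    "bob": {"email_length": 40.0, "chat_pause": 15.0},
}
match = nearest_individual(
    {"email_length": 110.0, "chat_pause": 6.0},
    profiles,
    ["email_length", "chat_pause"],
)
```

Because the attributes here are on different scales, a practical version would likely normalize each dimension before computing distances.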

FIG. 3 illustrates a block diagram of a hierarchy 300 of Netmetrics™ in accordance with exemplary embodiments. Cross layer Netmetrics™ 300 can include certain layers of a TCP/IP stack such as an application layer 320, and corresponding applications 325, a transport layer 330, and corresponding data 335, and a network layer 340, and corresponding network applications 345. It is thus appreciated that the systems and methods described herein can be defined and evaluated at different layers of the network.

FIG. 4 illustrates a flow chart of a method 400 of profiling a user on a network in accordance with exemplary embodiments. At block 410, an input of streams corresponding to network activities associated with the user is received, wherein the input of streams is received from one or more layers of the network. At block 420, in response to receiving a request to supply specified-input, a score function and a list of attributes to be monitored are received. At block 430, the input of streams is classified into network-activity classifications. At block 440, values and attributes for the network-activity classifications are extracted and placed into data sets. At block 450, a score of the data sets is calculated. In exemplary embodiments, the score is a utility function defined by applications. At block 460, the data sets are compared to a database of activity-specific attributes. At block 470, the data sets are mapped to a class of individuals based on a value of the score and the comparison of the database of activity-specific attributes. In exemplary embodiments, the method 400 can further include generating a dynamic profile of the data set based on an application-specific score function. In further exemplary embodiments, the method 400 can further include obtaining activity logs associated with the network activities, analyzing the activity logs in run-time and calculating values for the activity-specific attributes from the activity logs.
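The blocks of method 400 can be strung together in a compact sketch. It assumes records already carry an `activity` label and a `text` field, uses mean word count as the only extracted attribute, and maps to the class whose stored attributes give the closest score; all of these choices, and every name below, are illustrative assumptions rather than the claimed method.

```python
from typing import Callable


def profile_user(
    streams: list[dict],                                  # block 410: input of streams
    score_function: Callable[[dict[str, float]], float],  # block 420: supplied score function
    attribute_names: set[str],                            # block 420: attributes to monitor
    database: dict[str, dict[str, float]],                # activity-specific attributes per class
) -> tuple[float, str]:
    # Block 430: classify the streams by activity (label assumed present).
    by_activity: dict[str, list[dict]] = {}
    for record in streams:
        by_activity.setdefault(record["activity"], []).append(record)

    # Block 440: extract attribute values into a data set.
    data_set: dict[str, float] = {}
    for activity, records in by_activity.items():
        lengths = [len(r["text"].split()) for r in records]
        name = f"{activity}.mean_words"
        if name in attribute_names:
            data_set[name] = sum(lengths) / len(lengths)

    # Block 450: calculate the score of the data set.
    score = score_function(data_set)

    # Blocks 460-470: compare against the database and map to the closest class.
    best = min(database, key=lambda cls: abs(score_function(database[cls]) - score))
    return score, best


streams = [
    {"activity": "email", "text": "please review the attached report today"},
    {"activity": "chat", "text": "ok thanks"},
    {"activity": "chat", "text": "see you at noon"},
]
database = {
    "class_a": {"email.mean_words": 5.0, "chat.mean_words": 3.0},
    "class_b": {"email.mean_words": 20.0, "chat.mean_words": 10.0},
}
score, best = profile_user(
    streams,
    lambda attrs: attrs.get("email.mean_words", 0.0) + attrs.get("chat.mean_words", 0.0),
    {"email.mean_words", "chat.mean_words"},
    database,
)
```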

The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.

As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.

Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.

The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.

While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

1. A system for profiling a user on a network based on a data set to generate a score, the system consisting of: an activity classifier configured to receive Internet activity input including email, chat, browser and voice over Internet Protocol (VoIP) logs/streams; an email profiler coupled to the activity classifier; a chat profiler coupled to the activity classifier; a browser profiler coupled to the activity classifier; a voice over Internet Protocol (VoIP) logs/streams profiler coupled to the activity classifier, wherein the profilers are configured to extract values from the Internet Activity input for activity specific and generic attributes from the data set; a score calculator configured to receive the activity specific and generic attributes and calculate the score of the data set; a categorization engine configured to receive the score from the score calculator and map the data set to an individual or class of individuals based on the value of the score and on a database of activity-specific attributes; and an application configured to place weights on the activity specific and generic attributes to define a score function from the score, wherein the categorization engine is further configured to generate a dynamic profile of the data-set based on a score function generated by the application and to generate a dynamic category from the database based on the score function.